IN THE CLAIMS

Please amend the claims as follows: 

1. (currently amended) A method for browser-enhanced web crawling associated with a
network of hub processing units coupled to a plurality of information processing units
over a network, the method executed by a web crawler on a hub processing unit
associated with the network comprising [[the steps of]]:

retrieving a web document at an address, and extracting contents of the web
document for rendering an intermediate dynamically constructed in-memory webpage
representation of the web document at a hub processing unit which is formatted as if
displayed for viewing on an end-user's web browser;

loading secondary documents associated with the web document in order to
render the secondary documents as part of the in-memory webpage representation,
wherein the secondary documents include one or more images with textual content
embedded therein;

[[sending to one or more information processing units a browser side script to
gather metadata; and performing the sub-steps of:]]
producing a final HTML markup;

analyzing and summarizing the in-memory webpage representation to produce a
text map for the web page document of the textual contents therein [[final HTML markup
to produce metadata]]; and

using optical character recognition on the images to extract textual content for
adding to the textual map for the webpage document.

2. (currently amended) The method as defined in claim 1, wherein the retrieving [[a]] the
web document at an address [[step]] further comprises retrieving a document at an
address selected from the group of addresses consisting of a nodal address, a network
address, a URL and equivalents.

3. (currently amended) The method as defined in claim 1, wherein the [[analyzing and
summarizing step further comprises analyzing and summarizing the whole and
complete document]] one or more images with textual content embedded therein include
at least one of an in-line GIF image and an in-line JPEG image.

4. (currently amended) The method as defined in claim 1, [[further comprising the step of
analyzing any image data present in the document and any image data present in the
documents utilizing optical character recognition techniques]] wherein the loading
secondary documents further comprises the loading of secondary documents including
one or more Java applets with textual content embedded therein.

5. (currently amended) The method as defined in claim 1, wherein the [[step of]] loading
secondary documents further comprises the loading of secondary documents including
web documents selected from the group of documents consisting of in-line frames,
frames, and [[images, image maps, applets, audio, video or]] equivalents.

6. (currently amended) The method as defined in claim 4, wherein the [[step of analyzing
any image data present in the document and any image data present in the documents
utilizing optical character recognition techniques further comprises analyzing any
images and image maps in the image data to produce text data]] loading secondary
documents further comprises the loading of secondary documents including one or
more Java Script components with textual content embedded therein.



ARC9-2000-0046-US1 3 09/607,370




06/23/2004 17:57 561-989-9812 KAIN ET 



7. (currently amended) The method as defined in claim 1, wherein the retrieving [[step]] the
web document further comprises performing the following sub-steps of:

initializing a first list with seed values;

checking if there are any URLs to be processed and [[if there are,]] in response that
any URL exists to be processed then performing the following [[secondary]] sub-steps of:
determining if a URL is in a second list; and, in response that a URL is not
in the second list [[if it is not in the second list;]] then performing the following
[[tertiary]] sub-steps of:

inserting the URL into the first list;
scheduling the URL for crawling;
crawling the URL when scheduled to do so;
removing the URL from the first list after the scheduled crawling;
entering the URL into the second list; and
repeating the checking step until there are no more URLs to be
processed;

where if the determining step determines that the URL is in the second list then
repeating the checking step until there are no more URLs to be processed.
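
For illustration only, the checking loop recited in claim 7 might be sketched in Python as follows. The names `crawl`, `crawl_url` and `pending` are assumptions introduced for this sketch and do not appear in the claims. The claim does not say where further URLs to be processed come from, so the sketch assumes that crawling a URL returns newly discovered URLs, and it crawls each URL immediately rather than through a separate scheduler.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          crawl_url: Callable[[str], Iterable[str]]) -> List[str]:
    """Run the checking loop until no URLs remain; return URLs in crawl order.

    `crawl_url` is a hypothetical callback that crawls one URL and returns
    any URLs discovered while doing so.
    """
    seeds = list(seeds)
    first_list = set(seeds)    # first list, initialized with seed values
    second_list = set()        # second list, holds URLs already crawled
    pending = deque(seeds)     # assumption: queue of URLs still to be processed
    crawled_order: List[str] = []

    while pending:                        # checking step
        url = pending.popleft()
        if url in second_list:            # determining step
            continue                      # already crawled, so check again
        first_list.add(url)               # insert the URL into the first list
        discovered = crawl_url(url)       # schedule and crawl (done inline here)
        first_list.discard(url)           # remove from the first list afterwards
        second_list.add(url)              # enter the URL into the second list
        crawled_order.append(url)
        pending.extend(discovered)
    return crawled_order
```

Because membership in the second list is tested before any crawling, a URL reached by several paths is crawled only once.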

8. (original) The method as defined in claim 7, wherein the sub-step of initializing a first 
list with seed values further includes the list being a URL pool. 

9. (original) The method as defined in claim 7, wherein the sub-step of determining if a
URL is in a second list further includes the second list being a visited pool. 

10. (currently amended) The method as defined in claim 7, wherein the [[tertiary]] sub-step
of crawling further comprises the sub-steps of:

issuing an HTTP command to a web server named in the URL; 

receiving contents of an HTML page as a result of the issued HTTP command; and

passing on the contents of the HTML page to a Page Rendering subroutine.

11. (original) The method as defined in claim 10, further including the sub-steps
performed by the Page Rendering subroutine comprising: 

receiving the contents of the HTML page in the Page Rendering subroutine; 
building an in-memory representation of a Layout for the HTML page and if more 
data is needed to properly form the representation, then performing the sub-steps of: 

requesting additional web-based information; 

gathering this additional web-based information; 

inserting any URLs associated with this additional web-based information 
into the second list and a URL cache; 
building a final amended representation; and 

forwarding the final amended representation to an Extraction subroutine; 
wherein, if no more data is needed to properly form the in-memory representation, then 
forwarding the in-memory representation to the Extraction subroutine. 
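
As a rough illustration of the rendering sub-steps in claim 11, the Python sketch below builds a simple in-memory representation of an HTML page and gathers any additional documents the page references. The names `render_page`, `fetch`, `second_list` and `url_cache`, and the dictionary layout of the representation, are assumptions made for this sketch. It treats only `src` attributes of `img`, `frame`, `iframe` and `script` tags as requiring additional data, and it performs no actual layout.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urljoin


class _SecondaryRefs(HTMLParser):
    """Collects src attributes of tags that pull in additional documents."""

    TAGS = {"img", "frame", "iframe", "script"}

    def __init__(self) -> None:
        super().__init__()
        self.refs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.refs.append(value)


def render_page(base_url: str, html: str,
                fetch: Callable[[str], bytes],
                second_list: Set[str], url_cache: Set[str]) -> Dict:
    """Build the representation, loading additional data when it is needed.

    `fetch` is a hypothetical callback that retrieves one URL.
    """
    parser = _SecondaryRefs()
    parser.feed(html)
    representation = {"url": base_url, "html": html, "secondary": {}}
    for ref in parser.refs:                   # more data is needed
        absolute = urljoin(base_url, ref)     # resolve relative references
        representation["secondary"][absolute] = fetch(absolute)
        second_list.add(absolute)             # record URL in the second list
        url_cache.add(absolute)               # and in the URL cache
    return representation                     # passed on to extraction
```

When the page references nothing further, the loop body never runs and the initial representation is returned unchanged, matching the final "wherein" clause of the claim.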

12. (original) The method as defined in claim 11, further including the sub-steps
performed by the Page Extraction subroutine comprising: 

accessing a set of memory structures of the Page Renderer; 
copying a text portion of the structures into a text map; 

inspecting any in-line GIF and JPEG image references in the memory structures; 
extracting alternate text attributes; 
adding the alternate text attributes to a text map; 
invoking an optical character recognition engine; 
analyzing any in-line GIF and JPEG images using the optical character 
recognition engine for text content; 

extracting text content from the GIF and JPEG images; 
adding text content from the images to the text map; and 
forwarding the text map to a Page Summarizer subroutine. 
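
The extraction sub-steps of claim 12 might be sketched in Python as shown below. This is a minimal illustration under stated assumptions: `extract_text_map`, `load_image` and `ocr` are hypothetical names, the optical character recognition engine is passed in as a callable rather than tied to any particular product, GIF and JPEG images are recognized by file extension only, and the text map is modeled as a dictionary of three lists.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple


class _TextAndImages(HTMLParser):
    """Collects visible text plus (src, alt) pairs for GIF and JPEG images."""

    def __init__(self) -> None:
        super().__init__()
        self.text: List[str] = []
        self.images: List[Tuple[str, str]] = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            if src.lower().endswith((".gif", ".jpg", ".jpeg")):
                self.images.append((src, attributes.get("alt") or ""))


def extract_text_map(html: str,
                     load_image: Callable[[str], bytes],
                     ocr: Callable[[bytes], str]) -> Dict[str, List[str]]:
    """Copy text, alternate text and recognized image text into a text map.

    `load_image` and `ocr` are hypothetical callbacks supplied by the caller.
    """
    parser = _TextAndImages()
    parser.feed(html)
    text_map = {"text": list(parser.text), "alt_text": [], "image_text": []}
    for src, alt in parser.images:
        if alt:
            text_map["alt_text"].append(alt)         # alternate text attribute
        recognized = ocr(load_image(src))            # invoke the OCR engine
        if recognized:
            text_map["image_text"].append(recognized)
    return text_map                                  # forwarded to the summarizer
```

Keeping the three sources in separate lists lets a later stage weigh recognized image text differently from ordinary page text, although the claim itself requires only that all of it reach the text map.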

13. (original) The method as defined in claim 12, further including the sub-steps 
performed by the Page Summarizer subroutine comprising:

receiving a text map from the Page Extractor subroutine;
processing the text map in an application-specific manner;
applying data extraction patterns to the text map;
translating resultant data from the applying step;

forwarding any URLs present in the text map to a manager subroutine; and
forwarding any extracted data and metadata to application logic.
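
One possible reading of the summarizer sub-steps in claim 13 is sketched below in Python. The names `summarize`, `patterns`, `url_manager` and `application_logic` are assumptions for this illustration, the data extraction patterns are taken to be regular expressions, the text map is taken to be a list of strings, and the translating step is omitted because the claim does not say what translation is performed.

```python
import re
from typing import Callable, Dict, List

# Assumption: a deliberately simple pattern for URLs embedded in plain text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def summarize(text_map: List[str],
              patterns: Dict[str, str],
              url_manager: Callable[[str], None],
              application_logic: Callable[[Dict[str, List[str]]], None]
              ) -> Dict[str, List[str]]:
    """Apply extraction patterns to the text map and forward the results.

    `url_manager` and `application_logic` are hypothetical callbacks.
    """
    text = "\n".join(text_map)
    # Apply each named data extraction pattern to the text map.
    extracted = {name: re.findall(pattern, text)
                 for name, pattern in patterns.items()}
    # Forward any URLs present in the text map to the manager.
    for url in URL_RE.findall(text):
        url_manager(url)
    # Forward the extracted data to the application logic.
    application_logic(extracted)
    return extracted
```

For example, passing `{"price": r"\$(\d+\.\d{2})"}` as `patterns` would collect dollar amounts from the text map under the key `price`.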

14. (currently amended) A computer readable medium including programming
[illegible] programming instructions including [illegible]
web crawling associated with a network of hub processing units coupled to a plurality
of information processing units over a network, the browser enhanced web crawling
instructions on the computer readable medium comprising:

retrieving instructions for retrieving a web document at an address [illegible]

loading instructions for loading secondary documents [illegible]

[illegible] with textual content embedded therein;

[illegible] [[browser side script to gather metadata]]
[illegible] the textual map for the webpage document.

15. (currently amended) The computer readable medium as defined in claim 14,
wherein the retrieving instructions for retrieving a web document at an address further 
comprises retrieving instructions for retrieving a document at an address selected from 
the group of addresses consisting of a nodal address, a network address, a URL and 
equivalents. 

16. (currently amended) The computer readable medium as defined in claim 14,
wherein the [[analyzing and summarizing instructions further comprise analyzing and
summarizing instructions for analyzing and summarizing the whole and complete
document]] one or more images with textual content embedded therein include at least
one of an in-line GIF image and an in-line JPEG image.

17. (currently amended) The computer readable medium as defined in claim 14, [[further
comprising image analyzing instructions for analyzing any image data present in the
document and any image data present in the documents utilizing optical character
recognition techniques]] wherein the loading secondary documents further comprises the
loading of secondary documents including one or more Java applets with textual
content embedded therein.

18. (currently amended) The computer readable medium as defined in claim 14,
wherein the loading instructions for loading secondary documents further comprises
loading instructions for loading of secondary documents including web documents
selected from the group of documents consisting of in-line frames, frames, and [[images,
image maps, applets, audio, video or]] equivalents.

19. (currently amended) The computer readable medium as defined in claim 17,
wherein the [[analyzing instructions for analyzing any image data present in the
document and any image data present in the documents utilizing optical character
recognition techniques further comprises analyzing instructions for analyzing any
images and image maps in the image data to produce text data]] loading secondary
documents further comprises the loading of secondary documents including one or
more Java Script components with textual content embedded therein.

20. (currently amended) A browser-enhanced web crawling unit associated with a
network of a plurality of hub processing units coupled to a plurality of information
processing units over a network, the browser enhanced web crawling unit on a hub
processing unit comprising:

a retrieval unit for retrieving a web document at an address, and extracting
contents of the web document for rendering an intermediate dynamically constructed
in-memory webpage representation of the web document at a hub processing unit which is
formatted as if displayed for viewing on an end-user's web browser;

a loader for loading secondary documents as required associated with the web
document in order to render the secondary documents as part of the in-memory
webpage representation, wherein the secondary documents include one or more
images with textual content embedded therein;

[[an output for sending to one or more information processing units a browser side
script to gather metadata;]]

[[a producer for producing a final HTML markup; and]]

a summarizer for analyzing and summarizing the in-memory webpage
representation to produce a text map for the web page document of the textual contents
therein [[final HTML markup to produce the final metadata]]; and

[[a producer for producing a final HTML markup;]]

an optical character recognition engine for use on the images to extract textual
content for adding to the textual map for the webpage document.